oracle turing machine
Scalable AI Safety via Doubly-Efficient Debate
Jonah Brown-Cohen, Geoffrey Irving, Georgios Piliouras
The emergence of pre-trained AI systems with powerful capabilities across a diverse and ever-increasing set of complex domains has raised a critical challenge for AI safety as tasks can become too complicated for humans to judge directly. Irving et al. [2018] proposed a debate method in this direction with the goal of pitting the power of such AI models against each other until the problem of identifying (mis)-alignment is broken down into a manageable subtask. While the promise of this approach is clear, the original framework was based on the assumption that the honest strategy is able to simulate deterministic AI systems for an exponential number of steps, limiting its applicability. In this paper, we show how to address these challenges by designing a new set of debate protocols where the honest strategy can always succeed using a simulation of a polynomial number of steps, whilst being able to verify the alignment of stochastic AI systems, even when the dishonest strategy is allowed to use exponentially many simulation steps.
Mirrored Language Structure and Innate Logic of the Human Brain as a Computable Model of the Oracle Turing Machine
Han Xiao Wen (Weimingbosi Corporation, PKU Biocity, No. 39 Shang Di Xi Lu, Haidian, Beijing 100085, China)
We wish to present a mirrored language structure (MLS) and four logic rules determined by this structure for the model of a computable Oracle Turing machine. MLS has novel features that are of considerable biological and computational significance. It suggests an algorithm of relation learning and recognition (RLR) that enables deterministic computers to simulate the mechanism of the Oracle Turing machine, or P NP in mathematical terms. A concept of mirrored language structure for the human brain has already been proposed by Chomsky [4] as Universal Grammar (UG). His model consists of a hierarchical (deep and surface) dual language structure and a possible set of innate rules.